A note on the impossibility of conditional PAC-efficient reasoning in large language models
Large language models have achieved remarkable progress in complex problem-solving, but suffer from high computational costs during deployment (Kwon et al., 2023). To address this, various approaches have been proposed, including model routing (Ong et al., 2025; Dekoninck et al., 2025), speculative decoding (Leviathan et al., 2023), and adaptive reasoning strategies (Snell et al., 2024). Zeng et al. (2025) proposed PAC reasoning, which constructs a composite model f̂ that selectively switches between an expensive expert model and a cheaper fast model while providing statistical guarantees on performance loss. A typical example is the thinking-nonthinking paradigm, where the expert model performs extended chain-of-thought reasoning while the fast model generates direct responses. The original PAC reasoning framework provides marginal guarantees, controlling the expected risk over the input distribution. A natural extension is to ask whether we can achieve a stronger, conditional guarantee that controls the risk for each input point individually. This is analogous to the notion of object-conditional validity in conformal prediction (Vovk, 2012; Lei and Wasserman, 2014; Lei et al., 2018).
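The marginal guarantee can be pictured with a toy calibration routine: pick a confidence threshold on held-out data so that the average loss over inputs handed to the fast model stays below a budget ε, then route at inference time. This is only an illustrative sketch; the function names, the confidence score, and the calibration rule are stand-ins, not the actual construction in Zeng et al. (2025).

```python
def calibrate_threshold(scores, losses, epsilon):
    """Toy marginal calibration.

    scores[i]: fast model's confidence on calibration input i (higher = safer)
    losses[i]: performance loss if input i were answered by the fast model
    Returns the smallest threshold tau such that the mean loss over inputs
    the fast model is allowed to answer (score >= tau) is at most epsilon.
    """
    for tau in sorted(set(scores)):
        kept = [l for s, l in zip(scores, losses) if s >= tau]
        if kept and sum(kept) / len(kept) <= epsilon:
            return tau
    return float("inf")  # no threshold works: always defer to the expert


def composite(x, score, tau, fast, expert):
    # Marginal PAC-style routing: answer with the fast model only when its
    # confidence clears the calibrated threshold; otherwise call the expert.
    return fast(x) if score(x) >= tau else expert(x)
```

The conditional question in the note is precisely whether a rule like this can bound the loss for *each* `x`, rather than only on average over the calibration distribution.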
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Middle East > Jordan (0.04)
Proactive Hearing Assistants that Isolate Egocentric Conversations
Hu, Guilin, Itani, Malek, Chen, Tuochao, Gollakota, Shyamnath
We introduce proactive hearing assistants that automatically identify and separate the wearer's conversation partners, without requiring explicit prompts. Our system operates on egocentric binaural audio and uses the wearer's self-speech as an anchor, leveraging turn-taking behavior and dialogue dynamics to infer conversational partners and suppress others. To enable real-time, on-device operation, we propose a dual-model architecture: a lightweight streaming model runs every 12.5 ms for low-latency extraction of the conversation partners, while a slower model runs less frequently to capture longer-range conversational dynamics. Results on real-world 2- and 3-speaker conversation test sets, collected with binaural egocentric hardware from 11 participants totaling 6.8 hours, show generalization in identifying and isolating conversational partners in multi-conversation settings. Our work marks a step toward hearing assistants that adapt proactively to conversational dynamics and engagement. More information can be found on our website: https://proactivehearing.cs.washington.edu/
- Asia > China > Beijing > Beijing (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
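The dual-model architecture in the abstract can be sketched as a two-rate loop: a hypothetical lightweight model processes every audio frame (one 12.5 ms hop), while a slower model refreshes longer-range conversational context only every few frames. The names and the scheduling rule here are illustrative assumptions, not the authors' implementation.

```python
def run_pipeline(frames, fast_step, slow_step, slow_every):
    """Dual-rate processing sketch.

    fast_step(frame, context): low-latency per-frame extraction
    slow_step(history):        slower long-range context inference
    The slow model runs once every `slow_every` frames; the fast model
    runs on every frame, conditioned on the latest available context.
    """
    context = None
    outputs = []
    for i, frame in enumerate(frames):
        if i % slow_every == 0:
            context = slow_step(frames[: i + 1])  # infrequent, long-range
        outputs.append(fast_step(frame, context))  # every 12.5 ms hop
    return outputs
```

The design choice this illustrates is latency decoupling: the per-frame path stays cheap and streaming, while expensive turn-taking inference is amortized over many frames.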
ServeFlow: A Fast-Slow Model Architecture for Network Traffic Analysis
Liu, Shinan, Shaowang, Ted, Wan, Gerry, Chae, Jeewon, Marques, Jonatas, Krishnan, Sanjay, Feamster, Nick
Network traffic analysis increasingly uses complex machine learning models as the internet consolidates and traffic gets more encrypted. However, over high-bandwidth networks, flows can easily arrive faster than model inference rates. The temporal nature of network flows limits simple scale-out approaches leveraged in other high-traffic machine learning applications. Accordingly, this paper presents ServeFlow, a solution for machine-learning model serving aimed at network traffic analysis tasks, which carefully selects the number of packets to collect and the models to apply for individual flows to achieve a balance between minimal latency, high service rate, and high accuracy. We identify that on the same task, inference time across models can differ by 2.7x-136.3x, while the median inter-packet waiting time is often 6-8 orders of magnitude higher than the inference time! ServeFlow is able to make inferences on 76.3% of flows in under 16ms, a speed-up of 40.5x in median end-to-end serving latency, while increasing the service rate and maintaining similar accuracy. Even with thousands of features per flow, it achieves a service rate of over 48.5k new flows per second on a 16-core CPU commodity server, which matches the order of magnitude of flow rates observed on city-level network backbones.
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- (3 more...)
- Information Technology > Security & Privacy (0.93)
- Telecommunications (0.67)
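The fast-slow idea can be illustrated with a minimal routing sketch: classify a flow from its first packet with a cheap model, and escalate to the slower, more accurate model only when the fast prediction is low-confidence. The function names and the confidence-threshold rule are assumptions for illustration, not ServeFlow's actual policy.

```python
def serve_flow(packets, fast, slow, threshold):
    """Fast-slow serving sketch.

    fast(packet) -> (label, confidence): cheap model on the first packet
    slow(packets) -> label:              slower model on more packets
    Low-confidence fast predictions are escalated to the slow path,
    trading latency for accuracy only where it is needed.
    """
    label, conf = fast(packets[0])
    if conf >= threshold:
        return label, "fast"
    return slow(packets), "slow"
```

This mirrors the paper's balance: most flows exit on the low-latency path, so median serving latency drops while aggregate accuracy is preserved by the slow path.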
Vertical GaN Diode BV Maximization through Rapid TCAD Simulation and ML-enabled Surrogate Model
Lu, Albert, Marshall, Jordan, Wang, Yifan, Xiao, Ming, Zhang, Yuhao, Wong, Hiu Yung
In this paper, two methodologies are used to speed up the maximization of the breakdown voltage (BV) of a vertical GaN diode that has a theoretical maximum BV of ~2100V. Firstly, we demonstrate a 5X faster accurate simulation method in Technology Computer-Aided Design (TCAD). This allows us to find 50% more high-BV (>1400V) designs in a given simulation time. Secondly, a machine learning (ML) model is developed using TCAD-generated data and used as a surrogate model for differential evolution optimization. It can inversely design an out-of-the-training-range structure with a BV as high as 1887V (89% of the ideal case), compared with ~1100V for designs based on human domain expertise.
- North America > United States > Virginia (0.04)
- Asia > Middle East > Jordan (0.04)
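The surrogate-plus-optimization step can be sketched with a minimal differential evolution maximizer over box-bounded design variables. The toy quadratic surrogate in the test is a stand-in for the TCAD-trained ML model, and all hyperparameters here are illustrative, not those used in the paper.

```python
import random

def differential_evolution(f, bounds, pop_size=20, gens=100, F=0.8, CR=0.9, seed=0):
    """Minimal DE/rand/1/bin maximizer.

    f:      surrogate model mapping a design vector to a predicted objective
            (here, breakdown voltage)
    bounds: list of (low, high) per design variable
    """
    rng = random.Random(seed)
    dim = len(bounds)
    lo, hi = [b[0] for b in bounds], [b[1] for b in bounds]
    pop = [[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(gens):
        for i in range(pop_size):
            # Mutate using three distinct population members other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            trial = []
            for d in range(dim):
                if rng.random() < CR:
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    v = min(max(v, lo[d]), hi[d])  # clamp to the box
                else:
                    v = pop[i][d]
                trial.append(v)
            ft = f(trial)
            if ft > fit[i]:  # greedy selection (maximizing)
                pop[i], fit[i] = trial, ft
    best = max(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]
```

The appeal of a surrogate here is that `f` is a millisecond-scale ML prediction rather than a full TCAD run, so the optimizer can afford thousands of evaluations.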
The first AI universe sim is fast and accurate--and its creators don't know how it works
For the first time, astrophysicists have used artificial intelligence techniques to generate complex 3-D simulations of the universe. The results are so fast, accurate and robust that even the creators aren't sure how it all works. "We can run these simulations in a few milliseconds, while other 'fast' simulations take a couple of minutes," says study co-author Shirley Ho, a group leader at the Flatiron Institute's Center for Computational Astrophysics in New York City and an adjunct professor at Carnegie Mellon University. The speed and accuracy of the project, called the Deep Density Displacement Model, or D3M for short, wasn't the biggest surprise to the researchers. The real shock was that D3M could accurately simulate how the universe would look if certain parameters were tweaked--such as how much of the cosmos is dark matter--even though the model had never received any training data where those parameters varied.
- North America > United States > New York (0.25)
- North America > United States > California > Alameda County > Berkeley (0.05)
- North America > Canada > British Columbia (0.05)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
Advances in few-shot learning: reproducing results in PyTorch
Few-shot learning is an exciting field of machine learning which aims to close the gap between machines and humans in the challenging task of learning from few examples. In my previous post I provided a high-level summary of three cutting-edge papers in few-shot learning -- I assume you've either read that, are already familiar with these papers, or are in the process of reproducing them yourself. In this post I will guide you through my experience in reproducing the results of these papers on the Omniglot and miniImageNet datasets, including some of the pitfalls and stumbling blocks along the way. Each paper has its own section in which I provide a GitHub gist with PyTorch code to perform a single parameter update on the model described by the paper. To train the model, you just have to put that function inside a loop over the training data.
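The structure the post describes -- a function performing a single parameter update, wrapped in a loop over the training data -- can be sketched framework-free. This toy uses a 1-D linear model and a hand-computed gradient in place of the PyTorch gists, so every name here is illustrative rather than code from the post.

```python
def update_step(w, batch, lr=0.1):
    """One parameter update: forward pass, loss gradient, gradient step.

    Mirrors the shape of the per-paper gists (forward, loss, backward,
    optimizer step) for a 1-D linear model y = w * x with MSE loss.
    """
    xs, ys = batch
    # d/dw of mean((w*x - y)^2) over the batch.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    return w - lr * grad


def train(w, batches, lr=0.1):
    # The training loop the post describes: call the update function
    # once per batch of training data.
    for batch in batches:
        w = update_step(w, batch, lr)
    return w
```

Fitting `y = 2x` from `w = 0` converges to `w ≈ 2` after a few dozen such updates, which is exactly the loop-over-batches pattern the gists plug into.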